    Monte Carlo likelihood inference for missing data models

    We describe a Monte Carlo method to approximate the maximum likelihood estimate (MLE) when there are missing data and the observed-data likelihood is not available in closed form. The method uses simulated missing data that are independent and identically distributed and independent of the observed data. Our Monte Carlo approximation to the MLE is a consistent and asymptotically normal estimate of the minimizer θ* of the Kullback-Leibler information, as both the Monte Carlo and observed-data sample sizes go to infinity simultaneously. Plug-in estimates of the asymptotic variance are provided for constructing confidence regions for θ*. We give Logit-Normal generalized linear mixed model examples, calculated using an R package. Comment: Published at http://dx.doi.org/10.1214/009053606000001389 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
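    The abstract only outlines the estimator. The Python sketch below illustrates the idea under our own assumptions, using a simple logit-normal random-intercept model with intercept β and random-effect SD σ. It draws m values z_i once from a fixed importance density h = N(0, τ²), independently of the data. It then approximates the observed-data log likelihood as Σ_j log[(1/m) Σ_i f_θ(y_j | z_i) φ_σ(z_i) / h(z_i)] and maximizes that over a coarse grid. The model, the choice of h, the grid search, and every function name here are hypothetical. None of this is the R package's code.

```python
import math
import random
from collections import Counter


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def simulate_counts(n_clusters, k, beta, sigma, rng):
    """Simulate successes out of k Bernoulli trials per cluster with a random intercept."""
    counts = []
    for _ in range(n_clusters):
        p = expit(beta + rng.gauss(0.0, sigma))
        counts.append(sum(rng.random() < p for _ in range(k)))
    return counts


def mc_loglik(beta, sigma, count_table, k, z, tau):
    """Monte Carlo approximation of the observed-data log likelihood at theta = (beta, sigma).

    z holds m iid draws from the importance density h = N(0, tau^2); they are drawn
    once, independently of the data, and reused for every theta (an assumption of this sketch).
    """
    # Importance weights phi_sigma(z_i) / h(z_i); they do not depend on the cluster.
    weights = [normal_pdf(zi, sigma) / normal_pdf(zi, tau) for zi in z]
    probs = [expit(beta + zi) for zi in z]
    total = 0.0
    # Clusters with the same success count s share the same likelihood, so group them.
    for s, n_s in count_table.items():
        avg = sum(w * p ** s * (1.0 - p) ** (k - s) for w, p in zip(weights, probs)) / len(z)
        total += n_s * math.log(avg)
    return total


def mc_mle(counts, k, m=400, tau=1.5, seed=1):
    """Maximize the Monte Carlo log likelihood over a coarse (beta, sigma) grid."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, tau) for _ in range(m)]  # simulated missing data, held fixed
    table = Counter(counts)
    grid = [(b / 10.0, s / 10.0) for b in range(-20, 21) for s in range(2, 31, 2)]
    return max(grid, key=lambda th: mc_loglik(th[0], th[1], table, k, z, tau))


# Toy usage: simulated data with beta = 0.5, sigma = 1.0 (illustrative only).
counts = simulate_counts(n_clusters=200, k=5, beta=0.5, sigma=1.0, rng=random.Random(2024))
beta_hat, sigma_hat = mc_mle(counts, k=5)
print(beta_hat, sigma_hat)
```

    The same z_i are reused for every θ, so the approximate log likelihood is a smooth, deterministic function of θ. That means it could be passed to a numerical optimizer, and the grid is used here only to keep the sketch dependency-free.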

    Rare variant analysis of blood pressure phenotypes in the Genetic Analysis Workshop 18 whole genome sequencing data using sequence kernel association test

    The sequence kernel association test (SKAT) has become one of the most commonly used nonburden tests for analyzing rare variants. The performance of burden tests depends on how rare and common variants are weighted when they are collapsed within a genomic region. Using the systolic and diastolic blood pressure phenotypes of 142 unrelated individuals in the Genetic Analysis Workshop 18 data, we investigated whether the performance of SKAT also depends on the weighting scheme. We analyzed the entire sequencing data for all 200 replications using 3 weighting schemes: equal weighting, Madsen-Browning weighting, and SKAT default linear weighting. We considered two sets of variants: all single-nucleotide polymorphisms (SNPs) and only low-frequency SNPs. The SKAT default weighting scheme, which heavily downweights common variants, performed better for the genes in which causal SNPs are mostly rare, and it behaved similarly to the other weighting schemes once all common SNPs were eliminated. In contrast, the equal weighting scheme performed best for MAP4 and FLT3, both of which include a common variant with a large effect. However, SKAT performed poorly under all 3 weighting schemes: overall power across all causal genes was about 0.05, almost identical to the type I error rate. This poor performance is partly due to the small sample size, which resulted from the need to analyze only unrelated individuals. Because half of the causal SNPs were not found in the annotation file based on the 1000 Genomes Project, we suspect that performance was also affected by our use of incomplete annotation information.
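    To show how the weighting scheme enters the test, here is a minimal Python sketch of the weighted linear-kernel SKAT statistic for a continuous trait. It uses residuals from an intercept-only null model, giving Q = Σ_j w_j² (Σ_i g_ij (y_i - ȳ))² up to a scaling constant. The sketch rests on a few assumptions. The "SKAT default" weight is taken to be the Beta(1, 25) density evaluated at the minor allele frequency (MAF), which is the default in the SKAT R package. The Madsen-Browning weight is simplified to 1/√(n q_j (1 - q_j)), with q_j estimated from all n individuals; the original estimates it from unaffected individuals with a small-sample correction. Covariates and the p-value, a mixture of chi-squares usually computed with Davies' method, are omitted. All function names are hypothetical.

```python
import math


def maf(genotypes):
    """Minor allele frequency of one variant from 0/1/2 genotype codes."""
    p = sum(genotypes) / (2.0 * len(genotypes))
    return min(p, 1.0 - p)


def weight_equal(q, n):
    return 1.0


def weight_madsen_browning(q, n):
    # Simplified Madsen-Browning weight, assuming q is estimated from all n individuals.
    return 1.0 / math.sqrt(n * q * (1.0 - q))


def weight_beta_1_25(q, n):
    # Beta(1, 25) density at the MAF: 25 * (1 - q)^24; heavily downweights common variants.
    return 25.0 * (1.0 - q) ** 24


def skat_q(y, G, weight_fn):
    """Weighted linear-kernel SKAT statistic for a continuous trait, no covariates.

    G is a list of variants, each a list of genotypes for the n individuals in y.
    """
    n = len(y)
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]  # residuals under the intercept-only null model
    q_stat = 0.0
    for g in G:
        q = maf(g)
        if q == 0.0:
            continue  # monomorphic in this sample: contributes nothing
        score = sum(gi * ri for gi, ri in zip(g, resid))
        q_stat += weight_fn(q, n) ** 2 * score ** 2
    return q_stat


# Toy data (not from GAW18): 8 individuals, one rare and one common variant.
y = [1.2, 0.4, 2.5, 0.9, 3.1, 0.7, 1.8, 0.2]
G = [
    [0, 0, 1, 0, 1, 0, 0, 0],  # rare variant (MAF 0.125)
    [1, 2, 1, 0, 2, 1, 1, 0],  # common variant (MAF 0.5)
]
for name, fn in [("equal", weight_equal),
                 ("Madsen-Browning", weight_madsen_browning),
                 ("SKAT default", weight_beta_1_25)]:
    print(name, skat_q(y, G, fn))
```

    Under Beta(1, 25) weights, the common variant contributes almost nothing to Q. That matches the abstract's observation that this scheme helps when causal variants are mostly rare and hurts when a common variant carries a large effect.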

    An empirical comparison of meta-analysis and mega-analysis of individual participant data for identifying gene-environment interactions

    Pre-print. Meta-analysis, which combines results from multiple studies, is standard practice in GWAS. For genetic main effects, meta-analysis has been shown to provide results comparable to those of mega-analysis, which jointly analyzes the pooled data from the available studies. Gene-environment interaction (GEI) studies are an important component of genetic epidemiology research because they can explain part of the missing heritability, elucidate the biological networks underlying disease risk, and identify individuals at high risk for disease. However, it is not known whether meta- and mega-analyses of interactions also yield comparable results. In this study, we investigate whether both approaches provide comparable results for identifying interaction effects, using empirical data from 4 studies: the Framingham Heart Study, GENOA, HERITAGE, and HyperGEN. We performed a meta-analysis of cohort-specific results and a mega-analysis of the pooled data from all 4 studies. We used the standard 1 degree of freedom (df) test of the main effect only, the 1 df test of the interaction effect (in the presence of the main effect), and the joint 2 df test of main and interaction effects. The results from meta- and mega-analyses were highly consistent for all three tests: the correlation between -log(p) values from the two analyses was 0.89 for the 1 df main effect test, 0.90 for the 1 df interaction test, and 0.91 for the joint 2 df test. As expected, mega-analysis provided slightly better results, but both approaches yielded very similar results for the most promising SNPs. Moreover, mega-analysis is not always feasible, especially in very large and diverse consortia, because pooling of raw data may be limited by the terms of the informed consent. Our study illustrates that meta-analysis can also be an effective approach for identifying interactions in very large consortia without losing appreciable power.
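    The abstract does not specify how the cohort-specific results were combined. As a hedged illustration, the Python sketch below assumes fixed-effect inverse-variance weighting, a common choice in GWAS consortia. It covers the 1 df test of a single coefficient and a multivariate version for the joint 2 df test of (main, interaction). In the 2 df case, each cohort supplies estimates b_k with a 2×2 covariance V_k. The pooled estimate is (Σ V_k⁻¹)⁻¹ Σ V_k⁻¹ b_k, and the Wald statistic is compared with a chi-square on 2 df. All inputs in the usage lines are made-up numbers, not results from the four cohorts. The function names are hypothetical.

```python
import math


def ivw_meta(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis of one coefficient (1 df Wald test)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    beta = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def joint_meta_2df(betas, covs):
    """Multivariate fixed-effect meta-analysis of (main, interaction) with a 2 df Wald test."""
    info = [[0.0, 0.0], [0.0, 0.0]]  # running sum of V_k^{-1}
    score = [0.0, 0.0]               # running sum of V_k^{-1} b_k
    for b, v in zip(betas, covs):
        vi = inv2(v)
        for r in range(2):
            score[r] += vi[r][0] * b[0] + vi[r][1] * b[1]
            for c in range(2):
                info[r][c] += vi[r][c]
    v_meta = inv2(info)
    beta = [v_meta[r][0] * score[0] + v_meta[r][1] * score[1] for r in range(2)]
    # Wald statistic beta' V_meta^{-1} beta, where V_meta^{-1} = info.
    chi2 = sum(beta[r] * info[r][c] * beta[c] for r in range(2) for c in range(2))
    p = math.exp(-chi2 / 2.0)  # upper tail of a chi-square with 2 df
    return beta, chi2, p


# Hypothetical per-cohort inputs, for illustration only.
print(ivw_meta([0.12, 0.08, 0.15, 0.05], [0.06, 0.05, 0.09, 0.07]))
print(joint_meta_2df([[0.10, 0.12], [0.07, 0.08]],
                     [[[0.0016, -0.0004], [-0.0004, 0.0036]],
                      [[0.0009, -0.0002], [-0.0002, 0.0025]]]))
```

    A mega-analysis would instead fit one model to the pooled individual-level data. Comparing -log(p) from the two routes, as the authors did, shows how much is lost by sharing only summary statistics.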

    Identification of Genetic Association of Multiple Rare Variants Using Collapsing Methods

    Next-generation sequencing technology allows investigation of both common and rare variants in humans. Exomes are sequenced at the population level or in families to further study the genetics of human diseases. Genetic Analysis Workshop 17 (GAW17) provided exomic data from the 1000 Genomes Project together with simulated phenotypes. These data enabled evaluation of existing and newly developed statistical methods for rare variant sequence analysis, a setting in which standard statistical methods fail because the alleles are rare. Various alternative approaches have been proposed that overcome this rarity problem by combining multiple rare variants within a gene. These approaches are termed collapsing methods, and our GAW17 group focused on studying the performance of existing and novel collapsing methods using rare variants. All tested methods performed similarly in terms of type I error and power. Inflated type I error rates were consistently observed; they might be caused by gametic phase disequilibrium between causal and noncausal rare variants in this relatively small sample, as well as by population stratification. Incorporating prior knowledge, such as appropriate covariates and information on the functionality of SNPs, increased the power to detect associated genes. Overall, collapsing rare variants can increase the power to identify disease-associated genes. However, studying genetic associations of rare variants remains a challenging task that requires further development and improvement in data collection, management, analysis, and computation.
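    The abstract does not name the individual collapsing methods that were compared. As a hedged illustration of the general idea, the Python sketch below implements one simple scheme. It builds a CAST-style carrier indicator, which records whether an individual carries any rare allele in the gene, and then runs a 1 df Pearson chi-square test against case-control status. Several choices are our assumptions: genotypes count minor alleles (0/1/2), the MAF threshold, and the binary phenotype. The function names are hypothetical.

```python
import math


def minor_allele_freq(g):
    """MAF from 0/1/2 genotype codes, assumed to count the minor allele."""
    p = sum(g) / (2.0 * len(g))
    return min(p, 1.0 - p)


def collapse_rare(genotypes_by_variant, n, threshold):
    """CAST-style indicator: 1 if an individual carries a minor allele at any rare variant."""
    rare = [g for g in genotypes_by_variant if 0.0 < minor_allele_freq(g) < threshold]
    return [int(any(g[i] > 0 for g in rare)) for i in range(n)]


def chi2_test_2x2(carrier, is_case):
    """Pearson chi-square test (1 df) of the collapsed indicator against case status."""
    pairs = list(zip(carrier, is_case))
    a = sum(1 for car, case in pairs if car and case)          # carrier cases
    b = sum(1 for car, case in pairs if car and not case)      # carrier controls
    c = sum(1 for car, case in pairs if not car and case)      # non-carrier cases
    d = sum(1 for car, case in pairs if not car and not case)  # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin makes the test uninformative
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


# Toy data (not from GAW17): 5 cases followed by 5 controls.
is_case = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
variants = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 2, 1, 0, 1, 1, 0],  # common variant (MAF 0.4), not collapsed
]
carrier = collapse_rare(variants, n=10, threshold=0.1)
chi2, p = chi2_test_2x2(carrier, is_case)
print(carrier, chi2, p)
```

    With samples this small, the chi-square approximation is poor, so Fisher's exact test or a permutation p-value would be safer. The threshold of 0.1 was chosen only so the toy data contain rare variants. A quantitative trait would need a regression on the collapsed score instead of a 2×2 table.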